(57) Abstract: The present invention relates generally to computer software, and
more specifically, to a method and system of making computer software resistant
to tampering and reverse-engineering. "Tampering" occurs when an attacker makes
unauthorized changes to a computer software program, such as overcoming password
access, copy protection or timeout algorithms. Broadly speaking, the method of
the invention is to increase the tamper-resistance and obscurity of computer
software code by transforming the data flow of the computer software so that the
observable operation is dissociated from the intent of the original software
code. This way, the attacker cannot understand and decode the data flow by
observing the execution of the code. A number of techniques for performing the
invention are given, including encoding software arguments using polynomials,
prime number residues, converting variables to new sets of boolean variables,
and defining variables on a new n-dimensional vector space.



WO 00/77597 A1



patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European 
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, 
IT, LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, 
CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG). 



For two-letter codes and other abbreviations, refer to the "Guidance Notes on
Codes and Abbreviations" appearing at the beginning of each regular issue of the
PCT Gazette.



Published: 

— With international search report. 



WO 00/77597 



PCT/CA00/00678 




Tamper Resistant Software Encoding 

The present invention relates generally to computer software, and more 
specifically, to a method and system of making computer software resistant to 
tampering and reverse-engineering. 


Background of the Invention 

The market for computer software in all of its various forms is recognized to
be very large and is growing every day. In industrialized nations, hardly a business
exists that does not rely on computers and software, either directly or indirectly, in
their daily operations. As well, with the expansion of powerful communication
networks such as the Internet, the ease with which computer software may be
exchanged, copied and distributed is also growing daily.

With this growth of computing power and communication networks, a user's
ability to obtain and run unauthorized or unlicensed software is becoming less and
less difficult, and a practical means of protecting such computer software has yet to
be devised.

Computer software is generally written by software developers in a high-level 
language which must be compiled into low-level object code in order to execute on a 
computer or other processor. 

High-level computer languages use command wording that closely mirrors
plain language, so they can be easily read by one skilled in the art. Typically, source
code files have a suffix that identifies the corresponding language. For example,
Java™ is a currently popular high-level language and its source code typically carries
a name such as "prog1.java". High-level structure refers to, for example, the class
hierarchy of object oriented programs, or the module structure in Ada™ programs.

Object-code generally refers to machine-executable code, which is the output
of a software compiler that translates source code from human-readable to
machine-executable code. In the case of Java™, there is one file per class and the files have
names such as "className.class", where "className" is the name of the class.
Such files are generally called ".class files".

The low-level structure of object code refers to the actual details of how the 
program works. Low-level analysis usually focuses on, or at least begins with, one 
routine at a time. This routine may be, for example, a procedure, function or 
method. Analysis of individual routines may be followed by analyses of wider scope
in some compilation tool sets.

The low-level structure of a software program is usually described in terms of
its data flow and control flow. Data-flow is a description of the variables together with
the operations performed on them. Control-flow is a description of how control jumps
from place to place in the program during execution, and the tests that are performed
to determine those jumps.

Tampering refers to changing computer software in a manner that is against
the wishes of the original author. Traditionally, computer software programs have
had limitations encoded into them, such as requiring password access, preventing
copying, or allowing the software only to execute a predetermined number of times or
for a certain duration. However, because the user has complete access to the
software code, methods have been found to identify the code administering these
limitations. Once this coding has been identified, the user is able to overcome these
programmed limitations by modifying the software code.

Since a piece of computer software is simply a listing of data bits, ultimately,
one cannot prevent attackers from making copies and making arbitrary changes. As
well, there is no way to prevent users from monitoring the computer software as it
executes. This allows the user to obtain the complete data-flow and control-flow, so it
was traditionally thought that the user could identify and undo any protection. This
theory seemed to be supported in practice. This was the essence of the
copy-protection against hacking war that was common on Apple-II and early PC software,
and has resulted in these copy-protection efforts being generally abandoned.

Since then, a number of attempts have been made to prevent attacks by
"obfuscating" or making the organisation of the software code more confusing and
hence, more difficult to modify. Software is commercially available to "obfuscate"
source code in manners such as:

•    globally replacing variable names with random character strings. For
     example, each occurrence of the variable name "SecurityCode" could be
     replaced with the character string "1xcd385mxc" so that it is more difficult for
     an attacker to identify the variables he is looking for;
•    deleting comments and other documentation; and
•    removing source-level structural indentations, such as the indentation of loop
     bodies, to make the loops more difficult to read.
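
The first and third transformations listed above can be illustrated with a minimal Python sketch. It is not taken from any commercial obfuscator; the function names, the C-like input fragment and the replacement string "x1cd385mxc" (chosen to begin with a letter so that it remains a legal identifier) are assumptions made purely for illustration.

```python
import re

def rename_identifier(source, name, replacement):
    """Replace every whole-word occurrence of `name` with `replacement`."""
    return re.sub(r"\b" + re.escape(name) + r"\b", replacement, source)

def strip_indentation(source):
    """Remove leading whitespace from every line; only the layout is lost."""
    return "\n".join(line.lstrip() for line in source.splitlines())

# Hypothetical input fragment, held as plain text.
ORIGINAL = """\
int SecurityCode = read_code();
if (SecurityCode == expected) {
    unlock();
}"""

obscured = strip_indentation(
    rename_identifier(ORIGINAL, "SecurityCode", "x1cd385mxc"))
print(obscured)
```

Both steps are purely textual: the statements, their order and the values they compute are unchanged, so the program is harder to read but no harder to modify once it is understood.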
While these techniques obscure the source code, they do not make any
attempt to deter modification. Once the attacker has figured out how the code
operates, he is free to modify it as he chooses.

A more complex approach to obfuscation is presented in issued United States
Patent No. 5,748,741, which describes a method of obfuscating computer software by
artificially constructing a "complex wall". This "complex wall" is preferably a
"cascade" structure, where each output is dependent on all inputs. The original
program is protected by merging it with this cascade, by intertwining the two. The
intention is to make it very difficult for the attacker to separate the original program
from the complex wall again, which is necessary to alter the original program. This
system suffers from several major problems:

•    large code expansion, exceeding a hundred fold, required to create a
     sufficiently elaborate complex wall, and to accommodate its intertwining with
     the original code; and
•    low security since the obfuscated program may be divided into manageable
     blocks which may be de-coded individually, allowing the protection to be
     removed one operation at a time.

Other researchers are beginning to explore the potential for obfuscation in
ways far more effective than what is achieved by current commercial code
obfuscators, though still inferior to the obfuscation of issued United States Patent No.
5,748,741. For example, in their paper "Manufacturing cheap, resilient, and stealthy
opaque constructs", Conference on Principles of Programming Languages (POPL),
1998 [ACM 0-89791-979-3/98/01], pp. 184-196, C. Collberg, C. Thomborson, and D.
Low propose a number of ways of obscuring a computer program. In particular,
Collberg et al. disclose obscuring the decision process in the program, that is,
obscuring those computations on which binary or multiway conditional branches
determine their branch targets. Clearly, there are major deficiencies to this approach,
including:

•    because only control-flow is being addressed, domain transforms are not
     used and data obfuscation is weak; and
•    there is no effort to provide tamper-resistance. In fact, Collberg et al. do not
     appear to recognize the distinction between tamper-resistance and
     obfuscation, and as a result, do not provide any tamper-proofing at all.




The approach of Collberg et al. is based on the premise that obfuscation
cannot offer a complete solution to tamper protection. Collberg et al. state that: "... code
obfuscation can never completely protect an application from malicious
reverse-engineering efforts. Given enough time and determination, Bob will always be able to
dissect Alice's application to retrieve its important algorithms and data structures."

As noted above, it is desirable to prevent users from making small, 
meaningful changes to computer programs, such as overriding copy protection and 
timeouts in demonstration software. It is also necessary to protect computer software 
against reverse engineering which might be used to identify valuable intellectual 
property contained within a software algorithm or model. In hardware design, for 
example, vendors of application specific integrated circuit (ASIC) cell libraries often 
provide precise software models corresponding to the hardware, so that users can 
perform accurate system simulations. Because such a disclosure usually provides 
sufficient detail to reveal the actual cell design, it is desirable to protect the content of 
the software model. 

In other applications, such as emerging encryption and electronic signature 
technologies, there is a need to hide secret keys in software programs and 
transmissions, so that software programs can sign, encrypt and decrypt transactions 
and other software modules. At the same time, these secret keys must be protected 
against being leaked. 

There is therefore a need for a method and system of making computer 
software resistant to tampering and reverse engineering. This design must be 
provided with consideration for the necessary processing power and real time delay 
to execute the protected software code, and the memory required to store it. 

Summary of the Invention 

It is therefore an object of the invention to provide a method and system of 
making computer software resistant to tampering and reverse engineering which 
addresses the problems outlined above. 

The method and system of the invention recognizes that attackers cannot be
prevented from making copies and making arbitrary changes. However, the most
significant problem is "useful tampering", which refers to making small changes in
behaviour. For example, if the trial software was designed to stop working after ten
invocations, tampering that changes the "ten" to "hundred" is a concern, but
tampering that crashes the program totally is not a priority since the attacker gains no
benefit.

Data-flow describes the variables together with operations performed on
them. The invention increases the complexity of the data-flow by orders of
magnitude, allowing "secrets" to be hidden in the program, or the algorithm itself to be
hidden. "Obscuring" the software coding in the fashion of known code obfuscators is
not the primary focus of the invention. Obscurity is necessary, but not sufficient, for
achieving the prime objective of the invention, which is tamper-proofing.

One aspect of the invention is broadly defined as a method of increasing the
tamper-resistance and obscurity of computer software code comprising the steps of
transforming the data flow in the computer software code to dissociate the observable
operation of the transformed computer software code from the intent of the
original software code.

A second aspect of the invention is broadly defined as a method of increasing
the tamper-resistance and obscurity of computer software code comprising the steps
of encoding the computer software code into a domain which does not have a
corresponding semantic structure, to increase the tamper-resistance and obscurity of
the computer software code.

A further aspect of the invention is defined as a computer readable memory
medium, storing computer software code executable to perform the steps of:
compiling the computer software program from source code into a corresponding set
of intermediate computer software code; encoding the intermediate computer
software code into tamper-resistant intermediate computer software code having a
domain which does not have a corresponding semantic structure, to increase the
tamper-resistance and obscurity of the computer software code; and compiling the
tamper-resistant intermediate computer software code into tamper-resistant computer
software object code.

An additional aspect of the invention is defined as a computer data signal
embodied in a carrier wave, the computer data signal comprising a set of machine
executable code being executable by a computer to perform the steps of: compiling
the computer software program from source code into a corresponding set of
intermediate computer software code; encoding the intermediate computer software
code into tamper-resistant intermediate computer software code having a domain
which does not have a corresponding semantic structure, to increase the
tamper-resistance and obscurity of the computer software code; and compiling the
tamper-resistant intermediate computer software code into tamper-resistant computer
software object code.

Another aspect of the invention is defined as an apparatus for increasing the
tamper-resistance and obscurity of computer software code, comprising: front end
compiler means for compiling the computer software program from source code into
a corresponding set of intermediate computer software code; encoding means for
encoding the intermediate computer software code into tamper-resistant intermediate
computer software code having a domain which does not have a corresponding
semantic structure, to increase the tamper-resistance and obscurity of the computer
software code; and back end compiler means for compiling the tamper-resistant
intermediate computer software code into tamper-resistant computer software object
code.



Brief Description of the Drawings

These and other features of the invention will become more apparent from the
following description in which reference is made to the appended drawings in which:

Figure 1 presents an exemplary computer system in which the invention may be
embodied;

Figure 2 presents a flow chart of the invention applied to a software compiler in an
embodiment of the invention;

Figure 3 presents a flow chart of a general algorithm for implementation of the
invention;

Figure 4 presents a flow chart of a null coding routine in an embodiment of the
invention;

Figure 5 presents a flow chart of a polynomial encoding routine in an embodiment of
the invention;

Figure 6 presents a flow chart of a residue number encoding routine in an
embodiment of the invention;

Figure 7 presents a flow chart of a bit-explosion coding routine in an embodiment of
the invention;

Figure 8 presents a flow chart of a custom base coding routine in an embodiment of
the invention; and

Figures 9a and 9b present a flow chart of the preferred embodiment of the invention.
Detailed Description of Preferred Embodiments of the Invention

The invention lies in a means for recoding software code in such a manner
that it is fragile to tampering. Attempts to modify the software code will therefore
cause it to become inoperable in terms of its original function. The tamper-resistant
software may continue to run after tampering, but no longer performs sensible
computation.

The extreme fragility embedded into the program by means of the invention
does not cause execution to cease immediately once it is subjected to tampering. It
is desirable for the program to continue running so that, by the time the attacker
realizes something is wrong, the modifications and events which caused the
functionality to become nonsensical are far in the past. This makes it very difficult for
the attacker to identify and remove the changes that caused the failure to occur.

An example of a system upon which the invention may be performed is
presented as a block diagram in Figure 1. This computer system 10 includes a
display 12, keyboard 14, computer 16 and external devices 18.

The computer 16 may contain one or more processors or microprocessors,
such as a central processing unit (CPU) 20. The CPU 20 performs arithmetic
calculations and control functions to execute software stored in an internal memory
22, preferably random access memory (RAM) and/or read only memory (ROM), and
possibly additional memory 24. The additional memory 24 may include, for example,
mass memory storage, hard disk drives, floppy disk drives, magnetic tape drives,
compact disk drives, program cartridges and cartridge interfaces such as those found
in video game devices, removable memory chips such as EPROM or PROM, or
similar storage media as known in the art. This additional memory 24 may be
physically internal to the computer 16, or external as shown in Figure 1.

The computer system 10 may also include other similar means for allowing
computer programs or other instructions to be loaded. Such means can include, for
example, a communications interface 26 which allows software and data to be
transferred between the computer system 10 and external systems. Examples of
communications interface 26 can include a modem, a network interface such as an
Ethernet card, or a serial or parallel communications port. Software and data
transferred via communications interface 26 are in the form of signals which can be
electronic, electromagnetic, optical or other signals capable of being received by
communications interface 26.
Input and output to and from the computer 16 is administered by the
input/output (I/O) interface 28. This I/O interface 28 administers control of the display
12, keyboard 14, external devices 18 and other such components of the computer
system 10.

The invention is described in these terms for convenience purposes only. It 
would be clear to one skilled in the art that the invention may be applied to other 
computer or control systems 10. Such systems would include all manner of 
appliances having computer or processor control including telephones, cellular 
telephones, televisions, television set top units, lap top computers, personal digital 
assistants and automobiles. 



Compiler Technology 

In the preferred embodiment, the invention is implemented in terms of an
intermediate compiler program running on a computer system 10. Standard
compiler techniques are well known in the art. Two standard references which may
provide necessary background are "Compilers: Principles, Techniques, and Tools"
1988 by Alfred Aho, Ravi Sethi and Jeffrey Ullman (ISBN 0-201-10088-6), and
"Advanced Compiler Design & Implementation" 1997 by Steven Muchnick (ISBN
1-55860-320-4). The preferred embodiment of the invention is described with respect
to static single assignment, which is described in Muchnick.

Figure 2 presents an example of such an implementation in a preferred 
embodiment of the invention. Generally, a software compiler is divided into three 
components, described as the front end, the middle, and the back end. The front end 
30 is responsible for language dependent analysis, while the back end 32 handles 
the machine-dependent parts of code generation. Optionally, a middle component 
may be included to perform optimizations that are independent of language and 
machine. Typically, each compiler family will have only one middle, with a front end 
30 for each high-level language and a back end 32 for each machine-level language. 
All of the components in a compiler family can generally communicate in a common 
intermediate language so they are easily interchangeable. 

The first component of the software compiler is a front end 30, which receives 
source code, possibly in a high-level language and generates what is commonly 
described as internal representation or intermediate code. There are many such 
compiler front ends 30 known in the art. 
In the preferred embodiment of the invention, this intermediate code is then
encoded by the middle compiler 34 of the invention to make
the desired areas of the input software tamper-resistant. The operation of the
invention in this manner will be described in greater detail hereinafter.

Finally, the compiler back end 32 receives the tamper-resistant intermediate
code and generates object code. The tamper-resistant object code is then available
to the user to link and load, thereby creating an executable image of the source code
for execution on a computer system 10.

The use of compiler front ends 30 and back ends 32 is well known in the art.
Typically, these compiler components are commercially available "off the shelf",
although this is not yet the case for Java™, and are suited to particular computer
software and computers. For example, if a Compiler Writer wishes to compile a C++
program to operate on a 486 microprocessor, he would pair a front end 30 which
compiles high level C++ into intermediate code, with a back end 32 which compiles
this intermediate code into object code executable on the 486 microprocessor.

In the preferred embodiment of the invention, the tamper-resistant encoding
compiler 34 is implemented with a front end 30 that reads in Java™ class files and a
back end 32 that writes out Java™ class files. However, the invention can easily be
implemented using front ends 30 for different languages and machine binaries, and
with back ends 32 for different machines or even de-compilers for various source
languages. For example, it is likely that an embodiment will be brought to market to
compile C source into tamper-resistant C source. Of course, one can also
mix-and-match by reading Java™ class files and outputting C source, for example.

In the preferred embodiment of the invention, a standard compiler front end
30 is used to generate intermediate code in static single assignment form which
represents the semantics of the program; however, any similar semantic
representation may be used. To better understand the invention, it is useful to
describe some additional terminology relating to static single assignment.

Static Single Assignment and Other Flow-Exposed Forms

A middle compiler intended to perform optimization or other significant 
changes to the way computation is performed, typically uses a form which exposes 
both control- and data-flow so that they are easily manipulated. Such an intermediate 
form may be referred to as flow-exposed form. 




In particular, Static Single Assignment (SSA) form is a well-known, popular
and efficient flow-exposed form used by software compilers as a code representation
for performing analyses and optimizations involving scalar variables. Effective
algorithms based on Static Single Assignment have been developed to address
constant propagation, redundant computation detection, dead code elimination,
induction variable elimination, and other requirements.

Static single assignment is a fairly recent way of representing semantics that
makes it easy to perform changes on the program. Converting to and from static
single assignment is well understood and covered in standard texts like Muchnick.
Many optimizations can be performed in static single assignment and can be simpler
than the traditional non-static single assignment formulations.

Basically, in static single assignment form, each variable is cloned a number
of times, once for each assignment to that variable. This has the advantageous
property that each Variable Register (VR) has exactly one place that assigns to it and
the operations which consume the value from this particular assignment are exactly
known. Each definition of a variable is given a unique version, and different versions
of the same variable can be regarded as different program variables. Each use of a
variable version can only refer to a single reaching definition. This yields an
intermediate representation in which expressions are represented in directed acyclic
graph (DAG) form (that is, in tree form if there are no common subexpressions), and
the expression DAGs are associated with statements that use their computed results.
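
The versioning described above can be sketched for the simplest case, a straight-line sequence of statements with no branches, so that no merge assignments are needed. The tuple layout `(destination, operation, operands)` and the `name_version` naming scheme are assumptions made for this illustration and are not prescribed by the description.

```python
def to_ssa(statements):
    """Rename straight-line code so that every variable is assigned exactly once."""
    version = {}   # variable name -> number of its most recent version
    result = []
    for dest, op, args in statements:
        # Each use refers to its single reaching definition: the latest version.
        new_args = tuple(f"{a}_{version[a]}" if a in version else a for a in args)
        version[dest] = version.get(dest, -1) + 1
        result.append((f"{dest}_{version[dest]}", op, new_args))
    return result

code = [
    ("k", "copy", ("0",)),      # k = 0
    ("j", "copy", ("1",)),      # j = 1
    ("l", "copy", ("j",)),      # l = j
    ("j", "iadd", ("j", "k")),  # j = j + k
    ("k", "copy", ("l",)),      # k = l
]

for dest, op, args in to_ssa(code):
    print(f"{dest} = {op}({', '.join(args)})")
```

In the output, the second assignment to `j` becomes `j_1 = iadd(j_0, k_0)`: the two versions of `j` are now distinct variables, and every operand names exactly one definition.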

One important property in static single assignment form is that each definition
dominates all of its uses in the control flow graph of the program, unless the use is a
φ-assignment. A more detailed description of φ-assignments is given hereinafter.

Another important property in static single assignment form is that identical
versions of the same variable have the same value on any execution path starting
with the initial assignment and not looping back to this assignment. Of course,
assignments in loops may assign different values on different iterations, but the
property just given still holds.
When several definitions of a variable reach a merging node in the control
flow graph of the program, a merge function assignment statement called a phi, or φ,
assignment, is inserted to merge them into the definition of a new variable version.
This merging is required to maintain the semantics of single reaching definitions.
Merge nodes are covered in the standard text books such as Muchnick and the
present invention does not require them to be handled any differently.

Of course, the method of the invention could be applied to flow-exposed forms
other than SSA, where these provide similar levels of semantic information, as in that
provided in Gnu CC. Gnu CC software is currently available at no cost from the Free
Software Foundation.

Similarly, the method of the invention could be applied to software in its high
level or low level forms, if such forms were augmented with the requisite control- and
data-flow information. This flexibility will become clear from the description of the
encoding techniques described hereinafter.

Preferably, the method of the invention is implemented in the form of a
conventional compiler computer program operating on a computer system 10 or
similar data processor. As shown in Figure 3, the compiler program reads an input
source program, such as a program written in the C++ programming language,
stored in the memory 22 or mass storage 24, and creates a static single assignment
intermediate representation of the source code in the memory 22 using the compiler
front end 30. A simple example of this compiling into intermediate code follows.

Code Block 1A shows a simple loop in the FORTRAN language, which could
form a part of the source program input to the compiler front end 30. Code Block 1B
is a static single assignment intermediate representation of Code Block 1A output
from the compiler front end 30. In static single assignment, each virtual register
appears in the program exactly once on the left-hand side of an assignment. The
label t is used herein to intentionally correspond to the virtual register names of Code
Blocks below.

Code Block 1A          Code Block 1B
(FORTRAN Loop)         (Static Single Assignment IR)

                              % t00 = 0, t01 = 1
                              % t02 = 5, t03 = 50
K = 0                  s0     t04 = copy(t00)
J = 1                  s1     t05 = copy(t01)
DO 10 I = 1, 50        s2     t06 = copy(t01)
                       s10    t10 = φ(t04, t14)
                       s11    t11 = φ(t05, t13)
                       s12    t12 = φ(t06, t15)
L = J                  s3     t07 = copy(t11)
J = J + K              s4     t13 = iadd(t11, t10)
K = L                  s5     t14 = copy(t07)
10 CONTINUE            s6     t15 = iadd(t12, t01)
                       s7     t08 = ile(t15, t03)
                       s8     brt(t08, s10)
K = J + 5              s9     t9 = iadd(t13, t02)

Except for the initialization steps in the first five lines, each line of Code Block
1B corresponds to a line of source code in Code Block 1A. The sources and
destinations for all the operations are virtual registers stored in the memory and
labelled t1 to t10. The "iadd" instructions of the above Code Blocks represent CPU
integer add operations, the "ile" instruction is an integer less-than-or-equal-to
comparison, and the "brt" instruction is a "branch if true" operation. Merge nodes are
represented by the φ function in the intermediate code statements s10, s11 and s12.
The loop of Code Block 1A requires that the backward branch at s8 use the
statement number s10 to reference the head of the loop.
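
The correspondence between the two code blocks can be checked with a small interpreter, sketched below in Python. Several points are assumptions of this sketch rather than statements of the description: the tuple encoding of the statements, the reading of statement s12 as t12 = φ(t06, t15), the spelling t09 for the last register, and the rule that a φ takes its first operand on loop entry and its second after the backward branch. Python integers are unbounded, so the sketch ignores the overflow a fixed-width FORTRAN INTEGER might exhibit.

```python
def run_fortran_loop():
    """Direct transcription of Code Block 1A; returns the final value of K."""
    k, j = 0, 1
    for _ in range(1, 51):          # DO 10 I = 1, 50
        l = j
        j = j + k
        k = l
    return j + 5                    # K = J + 5

# (label, destination, operation, operands), following Code Block 1B.
PROGRAM = [
    ("s0",  "t04", "copy", ("t00",)),
    ("s1",  "t05", "copy", ("t01",)),
    ("s2",  "t06", "copy", ("t01",)),
    ("s10", "t10", "phi",  ("t04", "t14")),
    ("s11", "t11", "phi",  ("t05", "t13")),
    ("s12", "t12", "phi",  ("t06", "t15")),   # assumed form of s12
    ("s3",  "t07", "copy", ("t11",)),
    ("s4",  "t13", "iadd", ("t11", "t10")),
    ("s5",  "t14", "copy", ("t07",)),
    ("s6",  "t15", "iadd", ("t12", "t01")),
    ("s7",  "t08", "ile",  ("t15", "t03")),
    ("s8",  None,  "brt",  ("t08", "s10")),
    ("s9",  "t09", "iadd", ("t13", "t02")),
]

def run_ssa(program):
    """Execute the statements and return the final virtual register contents."""
    regs = {"t00": 0, "t01": 1, "t02": 5, "t03": 50}   # the '%' constant lines
    index = {stmt[0]: i for i, stmt in enumerate(program)}
    pc, from_back_edge = 0, False
    while pc < len(program):
        _label, dest, op, args = program[pc]
        if op == "phi":
            # No phi here reads another phi's result, so evaluating them
            # one after another is equivalent to evaluating them together.
            regs[dest] = regs[args[1] if from_back_edge else args[0]]
        elif op == "copy":
            regs[dest] = regs[args[0]]
        elif op == "iadd":
            regs[dest] = regs[args[0]] + regs[args[1]]
        elif op == "ile":
            regs[dest] = regs[args[0]] <= regs[args[1]]
        elif op == "brt":
            if regs[args[0]]:
                pc, from_back_edge = index[args[1]], True
                continue
        pc += 1
    return regs

print(run_ssa(PROGRAM)["t09"] == run_fortran_loop())
```

Under these assumptions the register written by s9 holds the same value as K at the end of the FORTRAN fragment, which is the sense in which Code Block 1B represents Code Block 1A.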

Use of Optimizers

Since the invention alters the organization of the software program beyond
understanding, many optimization techniques will become ineffective. Therefore,
any desired optimization should be done before the tamper-resistant compiling 36 in
Figure 3. Performing optimization after would require the tamper-resistant compiling
routine to leave special coding to ensure that the optimization routine does not alter
or remove essential coding. This would require a lot of additional code, and would be
error-prone. This would also require a new optimization algorithm, as current
algorithms do not take account of the special coding. Also, existing analysis
techniques such as Data-Flow Analysis and Alias Analysis may be used to guide the
choice of coding scheme by replacing 'worst-case' data-flow connectivity with
connectivity closer to reality, so that recodings to achieve matching codings are
employed only where really needed. For example, Range Analysis done as part of
Data-Flow Analysis can be used to determine how large the bases used in the
Residue Number Coding need to be.
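
The link between range analysis and base size can be illustrated with a short Python sketch. It assumes that the Residue Number Coding behaves like a classical residue number system, in which a value is held as its remainders modulo pairwise coprime bases and can be recovered only while it stays below the product of those bases. The candidate base list and all function names are hypothetical.

```python
from math import prod

def choose_bases(max_value, candidates=(3, 5, 7, 11, 13, 17, 19, 23)):
    """Take candidate primes until their product exceeds the analysed range."""
    bases = []
    for p in candidates:
        bases.append(p)
        if prod(bases) > max_value:
            return bases
    raise ValueError("range too large for the candidate bases")

def encode(x, bases):
    """Represent x by its residues, one per base."""
    return tuple(x % p for p in bases)

def add(a, b, bases):
    """Addition is performed independently on each residue."""
    return tuple((x + y) % p for x, y, p in zip(a, b, bases))

def decode(residues, bases):
    """Recover the value with the Chinese Remainder Theorem."""
    m = prod(bases)
    total = 0
    for r, p in zip(residues, bases):
        n = m // p
        total += r * n * pow(n, -1, p)   # modular inverse, Python 3.8+
    return total % m

# Suppose range analysis shows that a variable never exceeds 1000.
bases = choose_bases(1000)
total = add(encode(400, bases), encode(500, bases), bases)
print(bases, decode(total, bases))
```

With an analysed maximum of 1000 the sketch selects the bases 3, 5, 7 and 11, whose product is 1155. A looser bound would force more or larger bases and therefore more residue arithmetic per operation.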
General Implementation of Tamper-Resistant Compiling

The tamper-resistant encoding compiler 34 of the invention receives and
analyses the internal representation of Code Block 1B. Based on its analysis, the
tamper-resistant encoding compiler 34 restructures portions of the intermediate
representation, thereby making it fragile to tampering.

In general, the tamper-resistant encoding compiler 34 performs three passes
of the intermediate code graph for each phase of encoding, shown in Figure 3 as
steps 38 through 46. In the preferred embodiment of the invention, the popular
practice of dividing the compiler into a number of phases, several dozen, in fact, is
being followed. Each phase reads the static single assignment graph and does only
a little bit of the encoding, leaving a slightly updated static single assignment graph.
This makes it easier to understand and to debug. A "phase control file" may be used
to specify the ordering of the phases at step 38 and particular parameters of each
phase, for added flexibility in ordering phases. This is particularly useful when one
phase is to be tested by inserting auditing phases before and/or after it, or when
debugging options are added to various phases to aid debugging.

Whenever variable codings are chosen, three passes of the intermediate 
code graph are generally required. In a first pass, at step 40, the tamper-resistant 
encoding compiler 34 walks the SSA graph and develops a proposed system of 
re-codings. If the proposed codings are determined to be acceptable at step 42, which 
may require a second pass of the SSA graph, control proceeds to step 44, where the 
acceptable re-codings are then made in a third pass. If the proposed coding is found 
to contain mismatches at step 42, then recodings are inserted as needed to eliminate 
the mismatches at step 46. 
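
The three passes described above can be sketched roughly as follows. This is only an illustrative sketch, not the compiler of the invention: the `Op` record, the coding names ("poly", "residue", "null") and the simplifying rule that every operand must share the coding of the result are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """One SSA assignment, dest := opname(args). Hypothetical structure."""
    dest: str
    opname: str
    args: tuple


def propose_codings(ops, preferred):
    """Pass 1: walk the graph and propose a coding for every variable."""
    proposal = {}
    for op in ops:
        for name in (op.dest, *op.args):
            # Variables with no stated preference fall back to the null coding.
            proposal.setdefault(name, preferred.get(name, "null"))
    return proposal


def find_mismatches(ops, proposal):
    """Pass 2: list (op index, operand) pairs whose coding differs from the result's.

    Assumed rule for this sketch: operands must use the same coding as dest.
    """
    return [(i, arg)
            for i, op in enumerate(ops)
            for arg in op.args
            if proposal[arg] != proposal[op.dest]]


def apply_codings(ops, proposal):
    """Pass 3: emit the rewritten ops, inserting RECODE ops at each mismatch."""
    mismatches = set(find_mismatches(ops, proposal))
    out = []
    for i, op in enumerate(ops):
        new_args = []
        for arg in op.args:
            if (i, arg) in mismatches:
                tmp = f"{arg}_as_{proposal[op.dest]}_{i}"
                out.append(Op(tmp, "RECODE", (arg,)))
                new_args.append(tmp)
            else:
                new_args.append(arg)
        out.append(Op(op.dest, op.opname, tuple(new_args)))
    return out
```

In this sketch a mismatch never blocks the phase; it only costs an extra RECODE operation, which mirrors the role of step 46.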

Once all of the encoding phases have been executed, the resulting tamper- 
resistant intermediate code is then compiled into object code for storage or machine 
execution by the compiler back end 32. 

This hardening of software has traditionally been thought to be impossible. 
The usual reasoning is that the attacker can "watch" the program execute, thereby 
obtaining the complete data-flow and control-flow, so the attacker can undo any 
protection. 

Existing obfuscation techniques do not offer effective protection because they 
do not hide how the program actually runs. Therefore, existing decompiling tools 
may observe the software execution and point to the code that the attacker wishes to 
modify. The invention, however, decouples or dissociates the actual, observable 
operation from the corresponding software code so that the attacker may not find the 
corresponding code. This is done by transforming the domain of the data flow into a 
new domain which does not have a corresponding high level semantic structure. 
This new method makes it very difficult to fix any reference points for variables in the 
tamper-resistant program as everything has multiple interpretations. 

Because of this dissociation, the invention may be applied to small areas of 
the input software code. In a typical application, much of the executable code need 
not be made tamper-resistant since there is no need for it to be secure from 
tampering. For example, encoding software which creates a bit-mapped graphical 
user interface (GUI) would be pointless as the information it conveys is immediately 
evident to the user. 

Obfuscation relies solely on "hiding" the organization of the computer software 
for protection. Existing obfuscators are weak, so a larger portion of the source code 
must be obfuscated to ensure that some degree of obscurity is achieved in the area 
of the program requiring protection. The invention, in contrast, provides strong 
obfuscation, and resists tampering both by obscurity and by extreme induced fragility. 
Therefore, the invention need only encode the area of the program requiring 
protection. 

This allows the invention to be far more efficient in terms of memory, 
processing power and execution time. For example, if the source code requires 1 
megabyte of memory, but all of the security measures reside in a 5 kilobyte block, 
tamper-resistant encoding that expands that 5 kilobyte block by a factor of 20, in the 
manner of the invention, will only increase the overall size of the input software 
program by about 10%, from 1 megabyte to about 1.1 megabytes. In contrast, if it were 
necessary to apply the process of the invention to all of the source code, the program 
would grow to 2000% of its original size, that is, to 20 megabytes. 

The method and system of the invention recognizes that one cannot prevent 
attackers from making copies and making arbitrary changes. However, the most 
significant problem is "useful tampering", which refers to making small changes in 
behaviour. For example, if the trial software was designed to stop working after ten 
invocations, tampering that changes the "ten" to "hundred" is a concern, but 
tampering that crashes the program totally is not important. 

In operation, the tamper-resistant encoding technique of the invention will 
work much like a compiler from the user's point of view, although the internal 
operations are very different. Users may start with a piece of software that is already 
debugged and tested, run that software through the invention software and end up 
with new tamper-resistant software. The new tamper-resistant software still appears 
to operate in the same manner as the original software but it is now hardened against 
tampering. 

Wide Applications 

Tamper-resistant encoding in a manner of the invention has very wide 
possible uses: 

1. Protecting the innovation of a software algorithm. For example, if one wished 
to sell software containing a new and faster algorithm to solve the linear 
programming problem, one would like to sell the software without disclosing 
the method. 

2. Protecting the innovation of a software model. In hardware design, it is 
common for vendors of ASIC cell libraries to provide precise software models 
so that users can perform accurate system simulations. However, it would be 
desirable to do so without giving away the actual cell design. 

3. Wrapping behaviour together. Often, it is desirable to write some software 
that will perform a function "A" if and only if an event "B" occurs. For 
example, a certain function is performed only if payment is made. 

4. Hiding secrets, such as adding encryption keys or electronic signatures into a 
program, so that the program can sign things and encrypt/decrypt things, 
without leaking the key. 

Clearly, there are other applications and combinations of applications. For 
example, an electronic key could be included in a decoder program and the decoding 
tied to electronic payment, thereby providing an electronic commerce solution. 

Properties of Tamper-Resistance 

The general approach is that each variable in the software program being 
encoded is mapped to some new set of variables, which is cleverly chosen to be not 
easily reversible to the original. Then, all the arithmetic is performed in the domain of 
the new set of variables when the program executes. 

A number of different techniques are presented herein for effecting this 
tamper-resistant encoding, which are described as null, polynomial, residue number, 
bit-exploded, bit-tabulated and custom base coding. These techniques may be 
applied using a large number of possible codings; as well, these and other coding 
techniques can be combined. For example, after using the residue number 
technique, each of the resulting components can be further encoded using the 
polynomial technique. 

These techniques are presented as examples of how the invention may be 
embodied, and one skilled in the art would be able to identify other similar techniques 
for effecting the invention. These techniques may be described in terms of the 
following properties: 

1. Anti-hologram 

This property is the opposite of that demonstrated by a hologram. If a piece 
of a hologram is removed, the whole image of the hologram will still be visible, 
but in reduced resolution. In contrast, the invention disperses the definition of 
a single variable into several locations so that a single modification or deletion 
in any one of those locations will corrupt its value. This property is desirable 
as it magnifies the detrimental effects of any tampering. Of the techniques 
described herein, Residue Number, Bit-Explosion and Custom Base have this 
property. 

For example, the assignment: 
x := 2y + z 

may be encoded with three different equations stored in different areas of the 
program as the following assignments: 
a := y + 2z 
b := y - z 
x := a + b 

In this simple example, the value of variable x will be modified if any of the three 
assignments is modified. 
Generally, the sequence in which the assignments are made is significant and 
is taken into consideration while variables and assignments in the input 
software program are being encoded. 
2. Fake-robustness 

This property describes tamper-resistant encoding which allows a given 
value to be changed and have the program still run, but because only a 
limited set of values are actually sensible, certain values will eventually lead 
to nonsensical computation. It is desirable to maximize the number of 
encodings that are fake-robust so that a tampered program will not 
immediately crash. This makes debugging more difficult as the attacker will 
have to analyse a larger region of code for any change. All of the coding 
techniques described above, except for the null technique, demonstrate this 
property. 

In the context of tamper-resistance, true robustness would allow the code to 
be modified somewhat with no change in semantics. Fake robustness does 
not preserve the semantics of the original software program when modified, 
so the modified software program continues to execute, but eventually will 
crash. 

For example, if an array A is known to have 100 elements, then converting the 
expression A[i] to the expression A[i mod 100] makes it fake-robust in that 
variable i may take on any value and not cause an array bounds error. 
However, certain values of variable i may cause nonsensical operation 
elsewhere in the program without causing a complete failure. 
3. Togetherness property 

In terms of encoded data flow, this property describes a scenario in which the 
definitions of several variables, preferably from different areas of the program, 
are arithmetically tied together. Therefore, an attacker cannot alter the 
encoded program by changing a single value in a single place. This 
increases the likelihood of any tampering causing a crash in a different area 
of the program. The degree of togetherness of the coding techniques 
presented herein is low with the Polynomial transform technique which has a 
"1 to 1" correspondence, moderate with Residue Numbers which has "1 to 
many" correspondence, and high with Custom Base which has a "many to 
many" correspondence. Note that the degree of togetherness for the 
polynomial encoding can be increased by splitting up equations as previously 
shown. 

For example, original variables x, y and z may be encoded into t, u, and v 
using appropriate functions: 
t = F_1(x, y, z) 
u = F_2(x, y, z) 
v = F_3(x, y, z) 

which may be decoded to yield the values of variables x, y and z, using a 
complementary set of appropriate functions: 

x = G_1(t, u, v) 
y = G_2(t, u, v) 
z = G_3(t, u, v) 
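
The anti-hologram and fake-robustness examples given above can be tried out with a short sketch. The function names and the `tamper` parameter are hypothetical and exist only to simulate an attacker editing one of the dispersed fragments; nothing here is taken from the invention's actual implementation.

```python
def x_original(y, z):
    """The unencoded assignment x := 2y + z."""
    return 2 * y + z


def x_dispersed(y, z, tamper=0):
    """The same value computed from three assignments that could live in
    different areas of a program. `tamper` simulates a change to fragment a."""
    a = y + 2 * z + tamper  # a := y + 2z
    b = y - z               # b := y - z
    return a + b            # x := a + b


def lookup_fake_robust(table, i):
    """A[i mod len(A)]: never raises a bounds error, but an out-of-range i
    silently selects some other element."""
    return table[i % len(table)]
```

With `tamper=0` the dispersed form agrees with the original; any non-zero change to a single fragment alters x. The lookup keeps running for an index such as 250 on a 100-element array, but returns element 50 rather than failing, which is the behaviour the text calls fake-robust.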

When described in terms of these three properties, the custom base 
technique appears to offer the most tamper-resistant encoding. However, 
consideration should be made for the resulting impact on run-time expansion, code 
space expansion, complexity of implementation, probability of requiring recoding and 
other metrics. Recoding refers to the addition of RECODE operations where 
mismatches would otherwise occur between proposed encodings. 

Each coding technique will have different time/space/complexity trade-offs for 
different operations. For example, residue number coding can handle large numbers 
for addition, subtraction and multiplication, but can only handle very restricted forms 
of division. Most texts state that residue number division is impossible, but the 
invention applies a method of division where the divisor is part of the residue base. 

Several techniques for realizing the invention will now be described. 



Null Coding 

A null coding is one which does not affect the original software program, that 
is, the original variable is represented by the same value. There are many places in 
a program where encoding is not particularly advantageous, for example at the input 
and output points of the program. As the inputs and outputs may be monitored from 
a known position outside the program, they are easily identified by an attacker. 
Rather than addressing the complexity of encoding the inputs and outputs, with little 
return for the effort, it is more convenient to use a null coding. 

Null coding may be realized by adding a routine as shown in Figure 4, as one 
of the phases executed at step 38 of Figure 3. As the SSA graph is traversed, one 
analyses each variable, and at step 48, determines whether an identified variable is 
one which is not to be hidden. As noted above, this may include a variable which is 
an input, output or otherwise pointless to hide. If so, then a null coding is performed 
at step 50. 

Also, as noted with regard to Figure 3, this null coding is recorded in the 
"phase control file". Therefore, if it is determined at step 48 that null coding is not 
required, the "phase control file" will be made aware at step 52 that another form of 
coding may be performed. 

By use of the decision block at step 54 and stepping through the lines of SSA 
code at step 56, the balance of the SSA graph is traversed. 



Polynomial Coding 

The polynomial encoding technique takes an existing set of equations and 
produces an entirely new set of equations with different variables. The variables in 
the original program are usually chosen to have meaning in the real world, while the 
new encoded variables will have no such meaning. As well, the clever selection of 
constants and polynomials used to define the new set of equations may allow the 
original mathematical operations to be hidden. 

This technique represents a variable x by some polynomial of x, such 
as ax + b where a and b are some random numbers. This technique allows 
us to hide operations by changing their sense, or to distribute the definition of 
a variable around in a program. 

A convenient way to describe the execution of the polynomial routine is in 
terms of a "phantom parallel program". As the polynomial encoding routine executes 
and encodes the original software program, there is a conceptual program running in 
parallel, which keeps track of the encodings and their interpretations. After the 
original software program has been encoded, this "phantom parallel program" adds 
lines of code which "decode" the output back to the original domain. 

For example, if the SSA graph defines the subtraction of two variables as: 
z := x - y (1) 
this equation may be hidden by defining new variables: 
x' := ax + b (2) 
y' := cy + d (3) 
z' := ez + f (4) 
Next, a set of random values for constants a, b, c, d, e, and f is chosen, and the 
original equation (1) in the software program is replaced with the new equation (5). 
Note that, in this case, the constant c is chosen to be equal to -a, which hides the 
subtraction operation from equation (1) by replacing it with an addition operation: 
z' := x' + y' (5) 
The change in the operation can be identified by algebraic substitution: 
z' := a(x - y) + (b + d) (6) 

Equation (5) is the equation that will replace equation (1) in the software 
program, but the new equations (2), (3) and (4) will also have to be propagated 
throughout the software program. If any conflicts arise due to mismatches, RECODE 
operations will have to be inserted to eliminate them. 

In generating the tamper-resistant software, the transformations of each 
variable are recorded so that all the necessary relationships can be coordinated in 
the program as the SSA graph is traversed. However, once all nodes of the SSA 
graph have been transformed and the "decoding" lines of code added at the end, the 
transformation data may be discarded, including equations (2), (3) and (4). That is, 
the "phantom parallel program" is discarded, so there is no data left which an attacker 
may use to reverse engineer the original equations. 

Note that a subtraction has been performed by doing an addition without 
leaving a negative operator in the encoded program. The encoded program only has 
a subtraction operation because the phantom program knows "c = -a". If the value of 
the constant had been assigned as "c = a", then the encoded equation would really 
be an addition. Also, note that each of the three variables used a different coding 
and there was no explicit conversion into or out of any encoding. 
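
The substitution leading to equation (6) can be checked numerically with a small sketch. The dictionary of constants plays the role of the "phantom parallel program"; the function names, the value ranges and the use of Python's `random` module are assumptions for illustration, and the sketch ignores the overflow constraints a real implementation would have to respect.

```python
import random


def make_subtraction_coding(rng):
    """Pick constants for equations (2)-(4) with c = -a, so that
    z' = x' + y' = a(x - y) + (b + d), i.e. e = a and f = b + d."""
    a = rng.randrange(2, 1000)
    b = rng.randrange(1, 1000)
    d = rng.randrange(1, 1000)
    return {"a": a, "b": b, "c": -a, "d": d, "e": a, "f": b + d}


def encode_x(x, k):
    return k["a"] * x + k["b"]      # equation (2)


def encode_y(y, k):
    return k["c"] * y + k["d"]      # equation (3)


def encoded_subtract(x_enc, y_enc):
    return x_enc + y_enc            # equation (5): only an addition appears


def decode_z(z_enc, k):
    # Invert equation (4); the division is exact by construction.
    return (z_enc - k["f"]) // k["e"]
```

Only `encoded_subtract` would need to remain in the protected program; the constants are known to the encoder alone until a value has to be decoded at an output.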

For the case of: 
y := -x (7) 
one could choose: 
x' := ax + b, and (8) 
y' := (-a)y + b (9) 
which would cause the negation operation to vanish, and x and y to appear to be the 
same variable. The difference is only tracked in the interpretation. 
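
A minimal sketch of the negation case, assuming equations (8) and (9) share the constants a and b: one stored number is read through two different interpretations, and no negation is ever executed. The function names are hypothetical.

```python
def encode_x(x, a, b):
    """Equation (8): the stored value is x' = a*x + b."""
    return a * x + b


def decode_as_x(stored, a, b):
    """Interpretation under equation (8)."""
    return (stored - b) // a


def decode_as_y(stored, a, b):
    """Interpretation under equation (9), y' = (-a)*y + b, so y = (b - y') / a."""
    return (b - stored) // a
```

For example, with a = 7 and b = 11 the value x = 5 is stored as 46, and the same 46 decodes to y = -5 under the second interpretation.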
Similarly, for the case of: 
y := x + 5 (10) 
one could choose: 
y' := ax + (b + 5) (11) 
causing the addition operation to vanish. Again, now there are two different 
interpretations of the same value. 

Figure 5 presents a simple implementation of the polynomial coding 
technique. At step 58, a line of the SSA graph is analysed to determine whether it 
defines a polynomial equation suitable for polynomial encoding. If so, a suitable set 
of polynomial equations is defined at step 60 that accomplishes the desired encoding. 
As noted above, this technique is generally applied to physically distribute the 
definition of a variable throughout a program so a single assignment is usually 
replaced by a system of assignments distributed throughout the program. 

For the simple polynomial scheme, the values of constants are generally 
unrestricted and the only concern is for the size of the numbers. Values are chosen 
which do not cause the coded program to overflow. In such a case, the values of 
constants in these equations may be selected randomly at step 62, within the 
allowable constraints of the program. However, as noted above, judicious selection 
of values for constants may be performed to accomplish certain tasks, such as 
inverting arithmetic operations. 

At the decision block of step 64 it is then determined whether the entire SSA 
graph has been traversed, and if not, the compiler steps incrementally to the next line 
of code by means of step 66. Otherwise, the phase is complete. 

Variations on this technique would be clear to one skilled in the art. For 
example, higher order polynomials could be used, or particular transforms developed 
to perform the desired hiding or inversion of certain functions. 

Residue Number Coding 

This technique makes use of the "Chinese Remainder Theorem" and is 
usually referred to as "Residue Numbers" in text books (see "The Art of Computer 
Programming", volume 2: "Seminumerical Algorithms", 1997, by Donald E. Knuth, 
ISBN 0-201-89684-2, pp. 284-294, or see "Introduction to Algorithms", 1990, by 
Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, ISBN 0-262-03141-8, 
pp. 823-826). A "base" is chosen, consisting of a vector of pairwise relatively 
prime numbers, for example: 3, 5 and 7. Then, each variable x is represented as a 
vector of remainders when this variable is divided by each number of the "base", 
that is, x maps on to (x rem 3, x rem 5, x rem 7). 

In this scheme, a "Modular Base" consists of several numbers that are 
pairwise relatively prime. Two distinct integers are said to be relatively prime if their 
only common divisor is 1. A set of integers are said to be pairwise relatively prime, if 
for each possible distinct pair of integers from the set, the two integers of the pair are 
relatively prime. 

An example of such a set would be {3, 5, 7}. In this base, integers can be 
represented as a vector of remainders by dividing by the base. For example: 

0 = (0, 0, 0), 
1 = (1, 1, 1), 
5 = (2, 0, 5), 
100 = (1, 0, 2), and 
105 = (0, 0, 0). 

Note that this particular base {3, 5, 7} has a period of 105, which is equal to 
the product 3 x 5 x 7, so that only integers inside this range may be represented. 
The starting point of the range may be chosen to be any value. The most useful 
choices in this particular example would be [0, 104] or [-52, 52]. 

If two integers are represented in the same base, simple arithmetic operations 
may be performed very easily. Addition, subtraction and multiplication, for example, 
may be performed component-wise in modular arithmetic. Again, using the base of 
{3, 5, 7}: 

if: 1 = (1, 1, 1) and 
5 = (2, 0, 5), then 
1 + 5 = ((1 + 2) mod 3, (1 + 0) mod 5, (1 + 5) mod 7) 
= (0, 1, 6). 

Of course, 1 + 5 = 6, and 6 in residue form with the same base is (0, 1, 6). 
Subtraction and multiplication are performed in a corresponding manner. 
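
The component-wise arithmetic above can be sketched in a few lines. The function names are hypothetical; the base {3, 5, 7} is the one used in the text.

```python
BASE = (3, 5, 7)


def to_residues(x, base=BASE):
    """Represent x as its vector of remainders with respect to the base."""
    return tuple(x % b for b in base)


def add(u, v, base=BASE):
    return tuple((p + q) % b for p, q, b in zip(u, v, base))


def sub(u, v, base=BASE):
    return tuple((p - q) % b for p, q, b in zip(u, v, base))


def mul(u, v, base=BASE):
    return tuple((p * q) % b for p, q, b in zip(u, v, base))
```

Each component is computed independently of the others, which is why no single component reveals the value being manipulated.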

Heretofore, division had been thought to be impossible, but can be done 
advantageously in a manner of the invention. First, however, it is of assistance to 
review the method of solving for the residue numbers. 

Converting from an integer to a corresponding Residue Number is simply a 
matter of dividing by each number in the base set to determine the remainders. 
However, converting from a Residue Number back to the original integer is more 
difficult. The solution as presented by Knuth is as follows. Knuth also discusses and 
derives the general solution, which will not be presented here: 

For an integer "a" which may be represented by a vector of residue numbers 
(a_1, a_2, ... a_k): 

a = (a_1 c_1 + a_2 c_2 + ... + a_k c_k) (mod n) (12) 

where: 

a_i = a (mod n_i) for i = 1, 2, ... k 

and: 

n = n_1 n_2 ... n_k 

and: 

c_i = m_i (m_i^-1 mod n_i) for i = 1, 2, ... k (13) 

and: 

m_i = n / n_i for i = 1, 2, ... k (14) 

and where the notation "(x^-1 mod y)" used above denotes that integer z such that 
xz (mod y) = 1. For example, (3^-1 mod 7) = 5 because 15 (mod 7) = 1, where 
15 = 3 x 5. 

In the case of this example, with a base (3, 5, 7), a vector of solution 
constants, (c3 = 70, c5 = 21, c7 = 15), is calculated. Once these constants have 
been calculated, converting a residue number (1, 1, 1) back to the original integer is 
simply a matter of calculating: 

r_1 c_1 + r_2 c_2 + r_3 c_3 = 1 x 70 + 1 x 21 + 1 x 15 = 106 (15) 

Assuming a range of [0, 104], multiples of 105 are subtracted, yielding an integer value 
of 1. 
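
Equations (12) to (14) can be exercised with the following sketch, which assumes Python 3.8 or later for `math.prod` and for the three-argument `pow` with a negative exponent (modular inverse). The function names are hypothetical.

```python
from math import prod


def solution_constants(base):
    """c_i = m_i * (m_i^-1 mod n_i), with m_i = n / n_i (equations 13 and 14)."""
    n = prod(base)
    return tuple((n // b) * pow(n // b, -1, b) for b in base)


def from_residues(residues, base):
    """Equation (12): recover the integer in the range [0, n - 1]."""
    n = prod(base)
    c = solution_constants(base)
    return sum(r * ci for r, ci in zip(residues, c)) % n
```

For the base (3, 5, 7) this yields the constants (70, 21, 15) quoted in the text, and the residue vector (1, 1, 1) decodes to 106 mod 105 = 1.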

Most texts like Knuth discuss Residue Numbers in the context of hardware 
implementation or high-precision integer arithmetic, so their focus is on how to pick a 
convenient base and how to convert into and out of that base. However, in applying 
this technique to the invention, the concern is on how to easily create many diverse 
bases. 

In choosing a basis for Residue Numbers, quite a few magic coefficients may 
be generated dependent on the bases. By observation of the algebra, it is desirable 
to have different bases with a large number of common factors. This can be easily 
achieved by having a list of numbers which are pairwise relatively prime, and each 
base just partitions these numbers into the components. For example, consider the 
set {16, 9, 5, 7, 11, 13, 17, 19, 23}, comprising nine small positive integers which are 
either prime numbers or powers of prime numbers. One can obtain bases for 
residue encoding by partitioning this set into three groups of three distinct elements 
and multiplying the elements of each group together. This keeps the 
numbers roughly the same size and allows a total range of 5,354,228,880 which is 
sufficient for 32 bits. For example, one such base generated in this manner might be 
{16 * 9 * 11, 5 * 13 * 23, 7 * 17 * 19} = {1584, 1495, 2261}. 
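
One way to produce many such bases is sketched below: shuffle the nine-element set and multiply each group of three. The random shuffle and the function names are assumptions for illustration; the source only states that each base partitions the list into components.

```python
import random
from math import gcd, prod

POOL = (16, 9, 5, 7, 11, 13, 17, 19, 23)


def random_base(rng, pool=POOL, groups=3):
    """Partition the pool into equal groups and multiply each group."""
    shuffled = list(pool)
    rng.shuffle(shuffled)
    size = len(shuffled) // groups
    return tuple(prod(shuffled[g * size:(g + 1) * size]) for g in range(groups))


def is_pairwise_coprime(nums):
    """True if every distinct pair of entries has greatest common divisor 1."""
    return all(gcd(p, q) == 1
               for i, p in enumerate(nums)
               for q in nums[i + 1:])
```

Every base produced this way has the same total range, 5,354,228,880, because the product of the components always equals the product of the whole pool.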
The invention allows a system of many bases with hidden conversion 
between those bases. As well, it allows the solution constants to be exposed without 
exposing the bases themselves. The original bases used to convert the software to 
residue numbers are not required to run the software, but would be required to 
decode the software back to the original high level source code. The invention allows 
a set of solution constants to be created which may run the software, without 
exposing the original bases. Therefore, the solution constants are of no assistance 
to the attacker in decoding the original software, or reverse engineering it. 

To hide the conversion of a residue number, r, defined by a vector of 
remainders (r_1, r_2, ... r_n) derived using a base of pairwise relatively prime numbers 
(b_1, b_2, ... b_n), a vector of solution constants is derived as follows. Firstly, using the 
method of Knuth, a vector of constants (c_1, c_2, ... c_n) may be determined which 
provides the original integer by the calculation: 

r = (r_1 c_1 + r_2 c_2 + ... + r_n c_n) (mod b_i) (16) 

where b_i is the ith number in the vector of pairwise relatively prime numbers 
{b_1, b_2, ... b_n}. As each of the corresponding r_1, r_2, ... r_n are residues, they will 
all be smaller than b_i, therefore equation (16) may be simplified to: 

r_i = (c_1 mod b_i) * r_1 + (c_2 mod b_i) * r_2 + ... + (c_n mod b_i) * r_n (17) 

Each component (c_j mod b_i) will be a constant for a given basis, and can be 
pre-calculated and stored so that the residue numbers can be decoded, and the software 
executed, when required. Because the vector of (c_j mod b_i) factors are not relatively 
prime, they will have common factors. Therefore, the base {b_1, b_2, ... b_n} cannot be 
solved from knowledge of this set of factors. Therefore, storing this set of solution 
constants with the encoded software does not provide the attacker with any 
information about the old or the new bases. 



Division of Residue Numbers 

Most texts like Knuth also indicate that division is impossible. However, the 
invention provides a manner of division by a constant. 

In order to perform division by a constant using residue numbers, the divisor 
must be one of the numbers of the base: 
Let: the base be {b_1, b_2, ... b_n}, 
the divisor be b_i, which is a member of the set {b_1, b_2, ... b_n}, and 
the quotient be (q_1, q_2, ... , q_n). 
Then, to calculate q_j (where i is not j): 
q_j = (c_j / b_i mod b_j) * r_j + ((c_i - 1) / b_i mod b_j) * r_i (19) 

The algebraic derivation is straightforward, by symbolically performing the full 
decoding and division. The key is the observation that all the other terms vanish due 
to the construction of the c_i's. 
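
The following sketch checks equation (19) numerically for the components q_j with j not equal to i. It assumes the solution constants of equations (13) and (14), adds a final reduction modulo b_j to keep each component in range, and deliberately leaves out the divisor's own component q_i, which needs the separate range handling described in the text. Function names are hypothetical; Python 3.8 or later is assumed.

```python
from math import prod


def solution_constants(base):
    """c_k = m_k * (m_k^-1 mod b_k), with m_k = (product of base) / b_k."""
    n = prod(base)
    return tuple((n // b) * pow(n // b, -1, b) for b in base)


def divide_other_components(residues, base, i):
    """Return {j: q_j} for every j != i, where q is x divided by base[i]
    (rounded down) and x is the integer represented by `residues`."""
    c = solution_constants(base)
    b_i = base[i]
    quotient = {}
    for j, b_j in enumerate(base):
        if j == i:
            continue
        k_j = (c[j] // b_i) % b_j        # c_j is an exact multiple of b_i
        k_i = ((c[i] - 1) // b_i) % b_j  # c_i - 1 is an exact multiple of b_i
        quotient[j] = (k_j * residues[j] + k_i * residues[i]) % b_j
    return quotient
```

With the base (3, 5, 7) and divisor 3, the value 100 = (1, 0, 2) gives q_2 = 3 and q_3 = 5, matching 33 mod 5 and 33 mod 7. The two multipliers per component depend only on the base, so they can be precomputed.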

To calculate q_i, the terms do not vanish, so a computation must be made of: 
q_i = (c_1 / b_i mod b_i) * r_1 + ... + (c_n / b_i mod b_i) * r_n (20) 

This equation does not take account of the range reduction needed, so a 
separate computation is used to calculate the number of times the range has been 
wrapped around, so that the proper value may be returned: 
w_i = [(c_1 / b_i) x r_1 + ... + (c_n / b_i) x r_n] / (rangeSize / b_i) x (rangeSize / b_i) (21) 

Therefore, the decoded integer value becomes: 
x = q_i + (rangeSize / b_i) x w_i (22) 

Figure 6 presents a flow chart of a simple implementation of a Residue 
Number encoding phase, in a preferred embodiment of the invention. The routine 
begins at step 68 by establishing a base set of pairwise relative primes, for example, 
the set of {16, 9, 5, 7, 11, 13, 17, 19, 23} as presented above. At step 70, a base is 
computed from this set as previously described, such as {1584, 1495, 2261}. A 
suitable block of software code is selected from the SSA graph and is transformed 
into residual form at step 72. If operators are found which are not calculable in the 
residue domain, then they will be identified in the phase control file, and those 
operators and their associated variables will be encoded using a different technique. 
At step 74, a corresponding set of solution constants is then calculated and is stored 
with the tamper-resistant program. As noted above, these solution constants are 
needed to execute the program, but do not provide the attacker with information 
needed to decode the tamper-resistant program. 

At step 76, a decision block determines whether the entire SSA graph has 
been traversed, and if not, the compiler steps incrementally to the next line of code by 
means of step 78. At step 80, a determination is made whether to select a new basis 
from the set of pairwise relative primes by returning to step 70, or to continue with the 
same set by returning to step 72. Alternatively, one could return to step 68 to create 
a completely new base set, though this would not generally be necessary. 

Once the decision block at step 76 determines that the SSA graph has been 
traversed, the phase is complete. 



Bit Exploded Coding 

Like the residue number coding above, the bit-exploded coding technique 
encodes one virtual register (VR) or other variable into multiple VRs or other 
variables. 

The idea is to convert one n-bit variable into n Boolean variables. That is, 
each bit of the original variable is stored in a separate and new Boolean variable. 
Each such new Boolean variable is either unchanged or inverted by interchanging 
true and false. This means that for a 32-bit variable, there are 2^32, a little over 4 
billion, bit-exploded codings to choose from. 

This encoding is highly suitable for code in which bitwise Boolean operations, 
constant shifts or rotations, fixed bit permutations, field extractions, field insertions, 
and the like are performed. Shifts, rotations, and other bit rearrangements have no 
semantic equivalent in high-level code, since they specifically involve determining 
which bits participate in which Boolean operations. 

For other Boolean operations, the complement operation, which takes a 
complemented input (if unary) or two complemented inputs (if binary) and returns a 
complemented result, is clear by application of de Morgan's laws, so dealing with the 
inversion of some of the variables in the bit-exploded representation is 
straightforward. Recall that de Morgan's first law states that: not ((not x) and (not y)) 
= x or y, and the second law states that: not ((not x) or (not y)) = x and y. In general, if 
op is a binary operation, it is desirable to use the operation op2 such that: 

x op2 y = not ((not x) op (not y)) 

Examples would be that the complement of the and operation is or, and the 
complement of the or operation is and. The same strategy applies to other 
operations as well. 

For bit-wise Boolean operations, either the operation or its complement on 
each bit is performed. For example, if a 4-bit variable x has been exploded into 4 
Boolean variables a, b, c, d, with a and d uninverted and b and c inverted, then where 
y has similarly been encoded as a', b', c', d' and z is to be encoded similarly as a", b", 
c", d", the operation: 

z = x and y 

may be performed by computing: 

a" = a and a' 
b" = b or b' 
c" = c or c' 
d" = d and d' 

since the or operation is the complement of the and operation, and it is the b and c 
components of each variable which are complemented. 
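
The 4-bit example above can be verified exhaustively with a short sketch. The list `inverted` records which stored Booleans are complemented; its bit ordering (least significant bit first) and the function names are assumptions made for this illustration.

```python
def explode(value, inverted):
    """Split value into len(inverted) stored Booleans, least significant bit
    first, complementing the positions where inverted[k] is True."""
    return [bool((value >> k) & 1) != inv for k, inv in enumerate(inverted)]


def implode(bits, inverted):
    """Undo explode(): recover the integer from its stored Booleans."""
    return sum(int(bit != inv) * 2 ** k
               for k, (bit, inv) in enumerate(zip(bits, inverted)))


def exploded_and(x_bits, y_bits, inverted):
    """z = x and y on exploded operands: 'and' on uninverted positions,
    its complement operation 'or' on inverted positions."""
    return [(p or q) if inv else (p and q)
            for p, q, inv in zip(x_bits, y_bits, inverted)]
```

Here x, y and z all share one pattern of inversions, as in the text; using a different pattern per variable would require choosing the per-bit operation from all three patterns.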

This encoding results in a substantial increase in the number of operations 
relative to the original program, except for operations which can be "factored out" 
because they can be done by reinterpreting which variables represent which bits or 
which bits in the representation are inverted. 

Some of this expansion may be avoided by using the optimization routine 
described hereinbelow. 

Figure 7 presents a flow chart of an exemplary implementation of bit-exploded
encoding. The routine begins at step 82, where a variable or set of
variables is identified for boolean encoding. At step 84, a corresponding set of
boolean variables is defined for each original variable. Additional lines of software
code are then added at step 86 to redefine the new boolean variables using shifts,
rotations, inversions and other transforms as described hereinabove. These
variables and their transforms are recorded in the "phantom parallel program", so that
the outputs of the program can be rationalised when required. Note that variables
which are completely internal to the program may never be rationalised in this
manner.

At step 88, a decision block determines whether the entire SSA graph has
been traversed, and if not, the compiler steps incrementally to the next variable, line
of code, or block of code, by means of step 90. If the entire SSA graph, or at least
the target SSA code, has been traversed, the phase is complete.
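
The routine of Figure 7 may be sketched, under stated assumptions, as follows. The record type BitCoding, the functions choose_coding, encode and decode, and the dictionary standing in for the "phantom parallel program" are all hypothetical names introduced for illustration; the random choice of a bit permutation and of inversions is one possible reading of the "shifts, rotations, inversions and other transforms" mentioned above.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class BitCoding:
    positions: tuple  # positions[i] = original bit index held by Boolean i
    inverted: tuple   # inverted[i] is True if Boolean i is stored complemented

def choose_coding(width, rng):
    """Pick a random bit permutation and a random set of inversions."""
    positions = list(range(width))
    rng.shuffle(positions)
    inverted = tuple(rng.random() < 0.5 for _ in range(width))
    return BitCoding(tuple(positions), inverted)

def encode(value, coding):
    """Explode an integer into Booleans according to the recorded coding."""
    return [bool((value >> pos) & 1) != inv
            for pos, inv in zip(coding.positions, coding.inverted)]

def decode(booleans, coding):
    """Rationalise an encoded value, as needed only for program outputs."""
    value = 0
    for bit, pos, inv in zip(booleans, coding.positions, coding.inverted):
        if bit != inv:
            value |= 1 << pos
    return value

# The codings are kept only at encoding time, never in the emitted program.
rng = random.Random(2024)
phantom = {name: choose_coding(8, rng) for name in ("x", "y")}
```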

An Optimization: Bit-Tabulated Coding 

In the bit-exploded technique described above, the resulting code may be 
excessively bulky and slow to execute. However, an optimization may be performed 
which reduces these inefficiencies. 

Bit-exploded coding may produce data-flow networks having subnetworks 
with the following properties: 

they have only a reasonably small number of inputs; and 

they are acyclic; that is, contain no loops. 

When this occurs, one can replace the entire network or subnetwork with a 
table lookup. This results from the fact that an m-input, n-output Boolean function can
be represented by a zero-origin table of 2^m n-bit elements. Instead of including the
network in the final encoded program, it is simply replaced with a corresponding table 
lookup, in which one indexes into the table using the integer index formed by 
combining the m inputs into a non-negative integer, obtaining the n-bit result, and 
converting it back into individual bits. Note that the positions of the bits in the index 
and the result of the above lookup can be random, and the network can be previously 
encoded using the bit-exploded coding, so the encoding chosen for the data is not 
exposed. 
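
A minimal sketch of this replacement follows, assuming a small hypothetical network of m = 3 inputs and n = 2 outputs. The functions network, pack, unpack and tabulated are illustrative names only, and the most-significant-first bit order is an arbitrary choice; as noted above, the bit positions could equally be randomised.

```python
def network(a, b, c):
    """A small acyclic Boolean network: 3 inputs, 2 outputs (hypothetical)."""
    t = a and not b
    return (t or c, (a != c) and not t)

def pack(bits):
    """Combine Booleans into a non-negative integer, first bit most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value

def unpack(value, width):
    """Convert an integer back into a tuple of `width` Booleans."""
    return tuple(bool((value >> (width - 1 - i)) & 1) for i in range(width))

M, N = 3, 2
# Zero-origin table of 2^M entries, each holding an N-bit result.
TABLE = [pack(network(*unpack(index, M))) for index in range(2 ** M)]

def tabulated(a, b, c):
    """Same function as network(), computed by a single table lookup."""
    return unpack(TABLE[pack((a, b, c))], N)
```

The table grows as 2^m, which is why the number of inputs must be kept small.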

It is desirable that the number of inputs to the table be small, to keep the table 
from becoming excessively large. However, for anything up to eight inputs, and 
sometimes for as many as 12, this is a viable approach, and can result in substantial 
savings of memory space and/or increased speed in execution compared to bit- 
exploded encoding. 

Moreover, bit-tabulated encoding is compatible with the bit-exploded 
encoding, and it is preferable to combine the two techniques where opportunities 
occur. 

The Reverse Transformation: Bit-Tabulated to Bit-Exploded 

The bit-tabulation encoding is an optimization of bit-exploded coding. 
Sometimes it is useful to perform the reverse of this transformation. That is, to 
transform a table-lookup with the above-described characteristics into a network of
Boolean operations. This is straightforward, and algorithms for converting from such
tables into such networks can be found in many books on circuit theory, for example,
Switching Theory, by Paul E. Wood, Jr., McGraw-Hill Book Co., 1968, Library of
Congress Catalog Card Number 68-11624.

An example where this reverse transformation is useful is when one wishes to
disguise the tables. For example, one may convert from the bit-tabular form to the
bit-exploded form, which involves the injection of random bit inversions, and then
when optimization converts parts of the code back into bit-tabular form, the tables are
drastically disguised and changed. Thereby, this provides an effective means for
data-coding small tables used in table lookup operations.
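
The conversion from a table to a network of Boolean operations may be sketched as below. This uses an unminimised sum-of-products form, which is a simple substitute chosen for brevity and not necessarily the algorithm given in the cited text; the function names and the four-entry table are hypothetical.

```python
def table_to_minterms(table, m, n):
    """For each of the n output bits (most significant first), list the
    input indices whose table entry has that bit set."""
    return [[i for i in range(2 ** m) if (table[i] >> (n - 1 - k)) & 1]
            for k in range(n)]

def eval_network(minterms, inputs):
    """Evaluate the sum-of-products network using only and/or/not tests."""
    m = len(inputs)

    def minterm_holds(index):
        # A minterm is the 'and' of every input or its complement.
        return all(inputs[j] == bool((index >> (m - 1 - j)) & 1)
                   for j in range(m))

    # Each output bit is the 'or' of its minterms.
    return tuple(any(minterm_holds(i) for i in terms) for terms in minterms)

TABLE = [3, 0, 2, 1]  # hypothetical 2-input, 2-output lookup table
MINTERMS = table_to_minterms(TABLE, 2, 2)

def lookup_bits(inputs):
    """Reference result obtained directly from the table."""
    index = (int(inputs[0]) << 1) | int(inputs[1])
    return tuple(bool((TABLE[index] >> (1 - k)) & 1) for k in range(2))
```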

For example, one may hide Data Encryption Standard (DES) Keys using Bit- 
Exploded and Bit-Tabulated coding. DES is currently the most widely known and 
studied encryption algorithm. Moreover, triple-DES variants of DES continue to be 
suitable forms of encryption even in quite secure applications. 

The DES algorithm is well suited for a combination of the bit-exploded and
bit-tabular encodings. By performing tamper-resistant data-encoding on a routine with
an embedded constant key, which performs DES encryption, for example, a
tamper-resistant software routine may be produced which still performs DES encryption, but
for which extraction of the key is a very difficult task. This extraction is particularly
difficult if a fully-unrolled implementation is used, that is, one in which the 16 rounds
of DES are separated into individual blocks of code instead of being implemented by
a loop cycling 16 times. Such unrolling can easily be performed with a text editor
prior to execution of the tamper-resistant encoding.

This is clear from consideration of the DES algorithm. The entire DES
encryption process consists of small shifts, bit permutations or bit transforms very
similar to permutations, and lookups in small tables called S-boxes which are already
in the ideal form for the bit-tabular to bit-exploded conversion mentioned above.

For example, given a subroutine which computes DES, in which the key is
embedded in the routine body as a constant, so that it computes DES for only this
one key, and in which the loop representing the 16 'rounds' of DES has been
unrolled, either by unrolling it at the source level, or by applying aggressive loop
unrolling to unroll the rounds in the code optimizer, this routine may be encoded
according to the method of the invention as follows:

1. The entire routine is encoded using the bit-exploded encoding, and using the 
conversion from bit-tabular to bit-exploded on the S-boxes. 

Note that the small shifts, word splits, and permutations disappear as they are 
simply re-interpretations of the identities of the Booleans. This is only true 
with the unrolled version where the shift for each round is a constant. 
At this point, the code contains excessive bulk, but may be reduced. 

2. The code produced above is now reduced using conventional constant 
folding. The effect is that the key has now completely disappeared, but the 
code bulk remains excessively large. 

3. Further encoding is now performed by recoding using the bit-exploded to bit- 
tabular optimization. 

A completely different set of S-boxes has now been produced which bears no
discoverable relation to the original ones and corresponds only to the encoded
data. The positions of the bits, and to some extent even which part of the
computation has been assigned to which S-box, are now radically changed.
The same process can be used to create a routine which performs the 
corresponding decryption. 
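
The effect of the three steps above can be illustrated on a deliberately reduced toy, which is not DES: a single 4-bit S-box lookup preceded by an exclusive-or with a 4-bit constant key. The S-box contents, the key, and all function names are assumptions made for illustration. The key is folded into the table (step 2), and the table is then rebuilt over randomly permuted and inverted input and output bits (steps 1 and 3), so that neither the key nor the original S-box appears in the result.

```python
import random

# Toy 4-bit S-box and embedded constant key (illustrative values only).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
KEY = 0b1011

def plain_round(x):
    """Unprotected toy round: key mixing followed by an S-box lookup."""
    return SBOX[x ^ KEY]

def random_bit_coding(width, rng):
    """Random bit permutation plus random inversions (as an XOR mask)."""
    perm = list(range(width))
    rng.shuffle(perm)
    return perm, rng.randrange(2 ** width)

def apply_coding(value, coding):
    perm, mask = coding
    out = 0
    for dst, src in enumerate(perm):
        out |= ((value >> src) & 1) << dst  # move bit src to position dst
    return out ^ mask                       # then invert the masked bits

def invert_coding(value, coding):
    perm, mask = coding
    value ^= mask
    out = 0
    for dst, src in enumerate(perm):
        out |= ((value >> dst) & 1) << src
    return out

rng = random.Random(7)
in_coding = random_bit_coding(4, rng)
out_coding = random_bit_coding(4, rng)

# New table: key folded in, indexed by encoded input, yielding encoded output.
HIDDEN = [apply_coding(SBOX[invert_coding(e, in_coding) ^ KEY], out_coding)
          for e in range(16)]

def hidden_round(encoded_x):
    """Protected toy round: one lookup, no key and no original S-box."""
    return HIDDEN[encoded_x]
```

In a larger program the neighbouring operations would consume and produce the encoded values directly, so the codings themselves would not need to be present at run time.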

The above method for hiding DES keys may not be particularly useful on its
own, since an attacker with access to the encryption and decryption routines could
simply use the routines themselves, instead of the keys, to achieve what could
otherwise have been achieved by knowing the keys. However, if DES or triple-DES is
embedded in a larger program, use of the control-flow encoding in concert with
data-flow encoding in a manner of the invention makes the above technique highly useful,
since it is then no longer possible to extract the encryption and decryption routines in
isolation.

There are many uses for software applications which embed and employ a 
secret encryption key without making either the key or a substitute for the key 
available to an attacker. The method of the invention can generally be applied to 
these applications. 

Custom Base Coding

As noted above, custom base coding provides the optimal tamper-resistance
in view of the three targeted properties: anti-hologram, fake-robustness and
togetherness. However, this performance is at the expense of memory and
necessary processing power. Therefore, it may be desirable to only use this
technique in certain portions of the target program, and to use techniques which are
less demanding of system resources in other areas of the target.

In broad terms, this coding technique is a variable transform in a custom
coordinate space. For example, values defined on an (x, y) coordinate space could
be transformed onto a (x - y, x + y) coordinate space. Such a transformation would
give the visual impression of a 45° rotation. Of course, this coding transformation
may also be n-dimensional, so the visual analogy to 2 dimensions is a limited
analogy. Note that the vectors need not be orthogonal, but they must be independent
in order to span the vector space. That is, if there are n vectors, they must form the
basis for an n-dimensional vector space.

For a simple example, variable "x" is grouped with some other variables such
as "y" and "z", that may be part of the program or decoy variables that have been
created. Then an invertible map to some other set of variables is created. This
technique basically treats x, y, z as basis vectors in some coordinate space, and the
mapping is just the change to a different basis.
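
The (x - y, x + y) transformation mentioned above may be sketched for integer variables as follows. The function names are hypothetical, and the two encoded operations shown (adding a constant to x, and exchanging x and y) are merely examples chosen to show that the program can operate on the transformed variables without recovering x or y.

```python
def to_custom(x, y):
    """Map (x, y) onto the custom basis (u, v) = (x - y, x + y)."""
    return x - y, x + y

def from_custom(u, v):
    """Inverse map. u + v = 2x and v - u = 2y, so both are always even."""
    return (u + v) // 2, (v - u) // 2

def add_to_x(u, v, c):
    """Encoded form of x = x + c: both coordinates shift by c."""
    return u + c, v + c

def swap_xy(u, v):
    """Encoded form of exchanging x and y: only the sign of u changes."""
    return -u, v

u, v = to_custom(10, 3)       # (7, 13)
u, v = add_to_x(u, v, 5)      # x becomes 15 without ever being decoded
assert from_custom(u, v) == (15, 3)
```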

In the same manner as the polynomial and bit-transform techniques, the
details of the custom base transformation are not required to execute the program, so
they may be discarded once it is complete. Therefore, there are no secrets left in the
executable tamper-resistant program that an attacker may use to decode it.

If this transform were executed on a single equation, it would be possible to
identify what has been done, and to reverse the transformation. However, with
multiple equations, the inverse transformation would be very difficult to calculate. As
well, there are additional degrees of freedom which increase the complexity, and
reduce the traceability by orders of magnitude. For example:

1. Variables need not be grouped with other variables that have either related
function or location. In fact, it is desirable to use disparate variables as an
attacker would be less likely to look towards diverse and unrelated areas of
the program for interdependency.

2. Decoy variables may be added to the SSA graph and included in the
transform. A particular area of a software program, for example, the copy
protection area, may be made the focus of this coding technique. Since the
code in the copy protection area of the software program is rarely executed,
this technique could be used to add 10,000 or so operations to this area. The
user would not generally be inconvenienced by the additional fraction of a
second it would take to execute the tamper-resistant copy protection code, or
the 40 kilobytes or so of memory it would require. An attacker, however,
would not be able to decode these operations using traditional
reverse-engineering techniques and would have to analyse them by hand. Therefore,
this method would have tremendous utility for tamper-resistance.

3. It is also straightforward to scale this coding technique to handle n- 
dimensions. This creates a large matrix of interdependent equations, spread 
throughout diverse areas of the tamper-resistant software program. This way, 
a small change in one area of the program may have consequences in many 
other areas. An attacker would not be able to identify which areas would be 
affected by a given change. 

4. The bases for the custom-base codings may be changed almost continuously 
as the tamper-resistant software is compiling, provided that the tamper- 
resistant compiler remembers the encodings, so that the operation and 
outputs of the tamper-resistant software remain coordinated. 

Figure 8 represents a simple application of this technology in a preferred 
embodiment of the invention. The routine begins at step 92, where a variable or set 
of variables is identified for custom base encoding. At step 94, decoy variables are 
added if necessary, bringing the number of variables to n. At step 96, additional lines 
of software code are then added to map these n variables onto a new n-dimensional 
space. These variables and their transforms are recorded in the "phantom parallel 
program", so that the outputs of the program can be rationalised when required. 
Note that variables which are completely internal to the program, may never be 
rationalised in this manner. 

At step 98, a decision block determines whether the entire SSA graph has
been traversed, and if not, the compiler continues to analyse the SSA graph, by
means of step 100. When the entire SSA graph, or at least the target SSA code,
has been traversed, the phase is complete.
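
Steps 92 to 96 may be sketched for n variables as below. This is one possible construction under stated assumptions, not the method of the preferred embodiment: the basis is built as a product of random integer row shears so that its inverse is also an integer matrix, and the names random_unimodular, mat_vec, real and decoys are hypothetical.

```python
import random

def random_unimodular(n, rng, steps=12):
    """Build an integer matrix of determinant 1 and its integer inverse by
    composing random row shears (add k times row j to row i)."""
    m = [[int(i == j) for j in range(n)] for i in range(n)]
    inv = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        k = rng.choice([-2, -1, 1, 2])
        # m <- E * m, where E adds k times row j to row i.
        m[i] = [a + k * b for a, b in zip(m[i], m[j])]
        # inv <- inv * E^-1: column j loses k times column i.
        for row in inv:
            row[j] -= k * row[i]
    return m, inv

def mat_vec(m, v):
    """Multiply a matrix by a vector using exact integer arithmetic."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

rng = random.Random(42)
real = {"x": 17, "y": -4}                                   # program variables
decoys = {"d1": rng.randrange(100), "d2": rng.randrange(100)}  # step 94
names = list(real) + list(decoys)
values = [*real.values(), *decoys.values()]

basis, inverse = random_unimodular(len(names), rng)         # step 96
encoded = mat_vec(basis, values)                # what the program would hold
decoded = dict(zip(names, mat_vec(inverse, encoded)))  # needed only for outputs
```

Because the rows of the basis are independent by construction, every encoded value generally depends on several of the original and decoy variables.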

Choosing Random Numbers

For all the coding schemes, a large number of random numbers are required.
For repeatability to aid debugging, pseudo-random numbers may be
advantageously used. Given that a large number of random numbers are required
and are used in many ways, truly random numbers such as those produced from
radioactive decay, are not necessary, but would offer increased tamper-resistance.
Presently, computer peripheral devices for the generation of truly random bits using
random electronic fluctuations are commercially available.
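
The trade-off between repeatability and unpredictability may be sketched as follows. The function coding_rng is a hypothetical helper; Python's random.SystemRandom draws from the operating system's entropy source, which is used here only as a stand-in for the hardware generators mentioned above.

```python
import random

def coding_rng(seed=None):
    """Return a repeatable generator when a seed is given (for debugging),
    otherwise one backed by the operating system's entropy source."""
    if seed is not None:
        return random.Random(seed)
    return random.SystemRandom()

# Two debugging runs with the same seed choose identical coefficients.
debug_a = coding_rng(1234)
debug_b = coding_rng(1234)
coeffs_a = [debug_a.randrange(1 << 16) for _ in range(5)]
coeffs_b = [debug_b.randrange(1 << 16) for _ in range(5)]
assert coeffs_a == coeffs_b

# A release build would omit the seed so that codings differ per build.
release = coding_rng()
release_coeff = release.randrange(1 << 16)
```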

The more interesting question is how to pick the coefficients and bases for the
various codings. The particulars of those selection strategies are outlined in the
discussion of the techniques themselves.

Preferred Implementation 

It is not sufficient merely to pick random codings; the codings must be
selected and coordinated so that each producer and consumer agree on the
interpretation/coding at every point. As described above, there are instances where
the program is such that a given selection will not nicely line up everything and a new
coding must be selected using a Recode operation.

There are many different ways to implement the invention, keeping in mind
that the goal is to minimize the times that data appear "in the plain" and to avoid
outputting the magic numbers into the scrambled program. One very simple way is to
divide the work into several phases, first assigning codings, then actually performing the
changes. An example of such an implementation is presented in the flow chart of
Figures 9a and 9b, which presents the following steps:

1. Compile the original program into static single assignment form at step 102 of
Figure 9a. As noted above, it is preferred to execute this step using a
standard compiler front end suitable to the application.

2. Optionally, optimize the intermediate code at step 104.

3. Walk the SSA graph to gather constraints at step 106. Examples of such
constraints would include:

a. identifying "merge" nodes. In static single assignment a merge node
does nothing, but requires that all its input/output have the same
coding;

b. if a divide by a constant is chosen to be coded using the residue
number technique, then the divisor must be part of the base; and

c. identifying input and output variables.

4. For each of the input and output variables, assign any pre-defined coding at
step 108. Variables whose values are inherently exposed may also be
Null coded. For example, if the outcome of a comparison will either be True
or False, it is difficult to hide the behaviour of any boolean branch which
employs it, so there is no advantage in tamper protecting it. There may,
however, be instances where there is an advantage to encoding such a
comparison, for example, if the control flow is to be encoded in some manner.

5. As noted above, it is preferred to perform the tamper-resistant techniques in
many phases to reduce possibility of error and improve the ease of
trouble-shooting. Steps 110 through 116 are performed until each desired phase has
been completed, which is determined at the decision point 110. As noted
above, the coordination of the phases is administered by a "phase control
file".

6. Walk the SSA graph at step 112 to propagate a proposed set of virtual
register codings into a phantom parallel program. If a virtual register has a
coding, then examine its producer operation and consumer operations to
propagate the encoding and generate new encodings where required.
When the first virtual register reaches an operation, assign coding for that
operation, which will usually assign codings to all its input/output virtual
registers.

7. The decision block at step 114 identifies inconsistencies or unallowable
conditions in a proposed encoding which would cause it to be disallowed. In
such a circumstance, control passes to step 116 to propose and analyse a
new coding. If a coding is allowed, control passes back to step 110 for the
next phase to be performed.

8. For operations that are left, some random, but allowable, coding may be
chosen and propagated to its input and output virtual registers at step 118 of
Figure 9b.

9. For each virtual register, which now all have a coding stored in the phantom
parallel program, generate a new set of virtual registers to contain the coded
values at step 120. For codings like Custom Base, several original virtual
registers will map into the same set of new virtual registers.

10. For each operation, gather the associated input and output virtual registers
and the corresponding new coded virtual registers at step 122. Expand the
operation into whatever is required. In the preferred embodiment, this is done
using a dedicated language to help perform the mapping between original and
coded virtual registers, but that is merely a matter of programming
convenience.

11. The tamper-resistant intermediate code is then compiled into tamper-resistant
object code using a standard compiler back end 32. As a refinement, prior to
the conversion to a specific executable object code in the back end 32, one
may take individual instructions and move each to one or more new locations,
where permitted by their data flow and control flow dependencies. This
increases the extent to which the encoded software exhibits the togetherness
and anti-hologram properties.

The preferred routine is then complete. 
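
The constraint-gathering and coding-assignment phases (steps 3, 4, 7 and 8 above) may be sketched as follows. This is a simplified illustration under stated assumptions: the SSA program is modelled as a list of (operation, output, inputs) tuples, codings are represented by plain strings, and the class CodingClasses and the functions gather_constraints and assign_codings are hypothetical names. Only the merge-node constraint is modelled.

```python
import random

class CodingClasses:
    """Union-find over virtual registers that must share one coding."""

    def __init__(self):
        self.parent = {}

    def find(self, reg):
        self.parent.setdefault(reg, reg)
        while self.parent[reg] != reg:
            self.parent[reg] = self.parent[self.parent[reg]]  # path halving
            reg = self.parent[reg]
        return reg

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def gather_constraints(ssa_ops):
    """Step 106: a merge node forces its inputs and output into one class."""
    classes = CodingClasses()
    for op, output, inputs in ssa_ops:
        classes.find(output)
        for reg in inputs:
            classes.find(reg)
            if op == "merge":
                classes.union(reg, output)
    return classes

def assign_codings(ssa_ops, predefined, candidates, rng):
    """Steps 108 to 118: apply pre-defined codings, reject conflicts, then
    pick a random allowable coding for every class still uncoded."""
    classes = gather_constraints(ssa_ops)
    coding_of_class = {}
    for reg, coding in predefined.items():
        root = classes.find(reg)
        if coding_of_class.setdefault(root, coding) != coding:
            raise ValueError(f"inconsistent coding for {reg}")  # cf. step 114
    phantom = {}
    for reg in sorted(classes.parent):
        root = classes.find(reg)
        if root not in coding_of_class:
            coding_of_class[root] = rng.choice(candidates)
        phantom[reg] = coding_of_class[root]
    return phantom  # stands in for the phantom parallel program

ops = [("add", "t1", ["a", "b"]), ("add", "t2", ["a", "c"]),
       ("merge", "t3", ["t1", "t2"]), ("mul", "out", ["t3", "a"])]
predefined = {"a": "null", "b": "null", "c": "null", "out": "null"}
phantom = assign_codings(ops, predefined,
                         ["polynomial", "residue", "bit-exploded"],
                         random.Random(0))
```

A fuller version would also propagate codings through each operation's producer and consumer registers, and would retry with a new proposal rather than raising an error.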

While particular embodiments of the present invention have been shown and
described, it is clear that changes and modifications may be made to such
embodiments without departing from the true scope and spirit of the invention. For
example, rather than using the encoding techniques described, alternate techniques
could be developed which dissociate the observable execution of a program from the
code causing the activity.

It is understood that as de-compiling and debugging tools become more and
more powerful, the degree to which the techniques of the invention must be applied
to ensure tamper protection will also rise. As well, the concern for system resources
may also be reduced over time as the cost and speed of computer execution and
memory storage capacity continue to improve.

These improvements will also increase the attacker's ability to overcome the
simpler tamper-resistance techniques included in the scope of the claims. It is
understood, therefore, that the utility of some of the simpler encoding techniques that
fall within the scope of the claims may correspondingly decrease over time. That is,
just as in the world of cryptography, increasing key-lengths become necessary over
time in order to provide a given level of protection, so in the world of the instant
invention, increasing complexity of encoding will become necessary to achieve a
given level of protection.

As noted above, it is also understood that computer control and software are
becoming more and more common. It is understood that software encoded in the
manner of the invention is not limited to the applications described, but may be
applied to any manner of software, stored or executing.

The method steps of the invention may be embodied in sets of executable
machine code stored in a variety of formats such as object code or source code.
Such code is described generically herein as programming code, or a computer
program for simplification. Clearly, the executable machine code may be integrated
with the code of other programs, implemented as subroutines, by external program
calls or by other techniques as known in the art.

The embodiments of the invention may be executed by a computer processor
or similar device programmed in the manner of method steps, or may be executed by
an electronic system which is provided with means for executing these steps.
Similarly, an electronic memory means such as computer diskettes, CD-ROMs, Random
Access Memory (RAM), Read Only Memory (ROM) or similar computer software
storage media known in the art, may be programmed to execute such method steps.
As well, electronic signals representing these method steps may also be transmitted
via a communication network.

It would also be clear to one skilled in the art that this invention need not be 
limited to the existing scope of computers and computer systems. 

Credit, debit, bank and smart cards could be encoded to apply the invention to
their respective applications. An electronic commerce system in a manner of the
invention could, for example, be applied to parking meters, vending machines, pay
telephones, inventory control or rental cars, using magnetic strips or electronic
circuits to store the software and passwords. Again, such implementations would be
clear to one skilled in the art, and do not take away from the invention.

WHAT IS CLAIMED IS:

1. A method of increasing the tamper-resistance and obscurity of computer
software code comprising the steps of:

transforming the data flow in said computer software code to dissociate the
observable operation of the transformed said computer software code from
the intent of the original software code.

2. A method of increasing the tamper-resistance and obscurity of computer 
software code comprising the steps of: 

encoding said computer software code into a domain which does not have a 

corresponding semantic structure, to increase the tamper-resistance and 
obscurity of said computer software code. 

3. A method as claimed in claim 2, wherein said step of encoding comprises: 
transforming the domains of individual operations in said software code, and of the 

data used by and computed by said individual operations in said software 
code, so that each individual operation, together with the data which it uses 
and the data which it computes, occupies a different data domain from the 
data domains of such operations and data in the original software code, and 
so that the original of said operations and the original of said data may not be 
readily deducible from the transformed versions of said operations and the 
transformed versions of said data. 

4. A method as claimed in claim 2 wherein said step of encoding comprises:
encoding arguments in said computer software code into a domain which does not 
have a corresponding high level semantic structure, to increase the tamper- 
resistance and obscurity of said computer software code. 

5. A method as claimed in claim 2 wherein said step of encoding comprises: 
dispersing the definition of an argument into a plurality of locations, to dissociate the
observable operation of said computer software code from said computer
software code while being executed.

6. A method as claimed in claim 5, further comprising the subsequent step of:
moving selected individual instructions to new locations permitted by their data flow 

and control flow dependencies. 

7. A method as claimed in claim 6, wherein said step of dispersing comprises: 
redefining an argument using one of the following techniques: 

residue coding; 
bit-explosion; 
bit-residue; or 
custom base coding. 

8. A method as claimed in claim 2 wherein said step of encoding comprises: 
encoding said computer software code such that minor changes will result in
nonsensical operation when the encoded software is executed, without
causing the encoded software to immediately fail.

9. A method as claimed in claim 8 further comprising the step of: 

adding code to said computer software code to allow variables to have a broader 
range of values without causing out of range errors. 

10. A method as claimed in claim 9, wherein said steps of encoding and adding 
code comprise: 

redefining an argument using one of the following techniques: 
polynomial coding; 
residue coding; 
bit-explosion; 
bit-residue; or 
custom base coding. 

11. A method as claimed in claim 2 wherein said step of encoding comprises: 
defining a first variable in said computer software code in terms of a second variable
in said computer software code, so that modification of said second variable
modifies the value of said first variable.

12. A method as claimed in claim 2 wherein said step of encoding comprises: 
defining a plurality of variables in terms of one another, so that modification of any 

one of said variables will alter the definition of all of said plurality of variables. 

13. A method as claimed in claim 12, wherein said step of defining comprises: 
redefining an argument using one of the following techniques: 

polynomial coding; 
residue coding; 
bit-explosion; 
bit-residue; or 
custom base coding. 

14. A method as claimed in claim 2 wherein said step of encoding comprises: 
responding to a line of code defining a polynomial equation by: 

redefining each variable in said polynomial equation by a new polynomial 
equation; and 

selecting random values of constants in said new polynomial equations. 

15. A method as claimed in claim 14 wherein said step of selecting comprises 
selecting values of constants in said new polynomial equations to invert the sense of 
an arithmetic operation in said polynomial equation. 

16. A method as claimed in claim 15 wherein: 

said step of redefining comprises redefining each variable in said polynomial 

equation by a new first order polynomial equation; and 
said step of selecting comprises selecting values of constants in said new first order 

polynomial equations to invert the sense of an arithmetic operation in said first 

order polynomial equation. 

17. A method as claimed in claim 2 wherein said step of encoding comprises: 
generating and storing a set of relatively prime factors; and 

transposing said computer software program by calculating residues based on said 
set of relatively prime factors. 

18. A method as claimed in claim 17, further comprising the steps of:
calculating a corresponding set of execution constants which may be used to execute 

said encoded computer software program; and 
storing said set of execution constants with said encoded computer software 
program. 

19. A method as claimed in claim 18 wherein said step of transposing comprises 
selecting a block of SSA code and transposing said block of SSA code into a 
corresponding set of residual code by calculating residues based on said set of 
relatively prime factors. 

20. A method as claimed in claim 2, wherein said step of encoding comprises: 
defining an n-bit variable as a corresponding set of n-boolean variables. 

21. A method as claimed in claim 20, further comprising the step of:

adding lines of code to invert selected ones of said corresponding set of n-boolean
variables.

22. A method as claimed in claim 21, further comprising the step of:
responding to the data flow of said computer software code having a reasonably 

small number of inputs and being acyclic, by replacing said corresponding set 
of n-boolean variables with a table lookup. 

23. A method as claimed in claim 2 wherein said step of encoding comprises:
mapping a set of n-variables into a new n-dimensional, custom coordinate space. 

24. A method as claimed in claim 23 wherein said step of mapping comprises:
mapping a set of n-independent variables into a new n-dimensional coordinate space 

defining a rotation of said set of n-independent variables from the original 
coordinate space. 

25. A method as claimed in claim 2, wherein said step of encoding comprises: 
encoding intermediate computer software code into tamper-resistant intermediate
computer software code having a domain which does not have a
corresponding semantic structure, to increase the tamper-resistance and
obscurity of said computer software code;
and further comprising: 

a prior step of compiling said computer software program from source code into a 
corresponding set of intermediate computer software code; and 

a subsequent step of compiling said tamper-resistant intermediate computer software 
code into said tamper-resistant computer software object code. 



26. A computer readable memory medium, storing computer software code 
executable to perform the steps of: 

compiling said computer software program from source code into a corresponding set 
of intermediate computer software code; 

encoding said intermediate computer software code into tamper-resistant 

intermediate computer software code having a domain which does not have a 
corresponding semantic structure, to increase the tamper-resistance and 
obscurity of said computer software code; and 

compiling said tamper-resistant intermediate computer software code into tamper- 
resistant computer software object code. 

27. A computer data signal embodied in a carrier wave, said computer data signal 
comprising a set of machine executable code being executable by a computer to 
perform the steps of: 

compiling said computer software program from source code into a corresponding set 
of intermediate computer software code; 

encoding said intermediate computer software code into tamper-resistant 

intermediate computer software code having a domain which does not have a 
corresponding semantic structure, to increase the tamper-resistance and 
obscurity of said computer software code; and 

compiling said tamper-resistant intermediate computer software code into tamper- 
resistant computer software object code. 

28. An apparatus for increasing the tamper-resistance and obscurity of computer
software code, comprising: 

front end compiler means for compiling said computer software program from source 
code into a corresponding set of intermediate computer software code; 

encoding means for encoding said intermediate computer software code into tamper- 
resistant intermediate computer software code having a domain which does 
not have a corresponding semantic structure, to increase the tamper- 
resistance and obscurity of said computer software code; and 

back end compiler means for compiling said tamper-resistant intermediate computer 
software code into tamper-resistant computer software object code. 